(12) UK Patent Application (19) GB (11) 2 237 908 (13) A

(43) Date of A publication 15.05.1991

(21) Application No 9020776
(22) Date of filing 24.09.1990
(30) Priority data
     (31) 8925227   (32) 08.11.1989   (33) GB

(51) INT CL5 [illegible]
(52) UK CL (Edition K) G4A AC[?] AMP
(56) Documents cited
     None

(58) Field of search
     UK CL (Edition K) G4A, [?]4T (subclass codes partly illegible)
     INT CL5 G06F

(71) Applicant
     British Aerospace Public Limited Company
     (Incorporated in the United Kingdom)
     11 Strand, London, WC2N 5JT, United Kingdom

(72) Inventor
     Steven Maxwell Parkes

(74) Agent and/or Address for Service
     P B Rooney
     Corporate Intellectual Property Department,
     British Aerospace Plc, PO Box 87,
     Royal Aerospace Est, Farnborough, GU14 6YU,
     United Kingdom

(54) Parallel processing of data

(57) In parallel processing of data, the data is organised into a two dimensional array having at least two rows (5a, 5b, 5c,
5d) and at least two transverse linking columns (6a, 6b, 6c, 6d), first high level data processing is carried out by first
processing means on the rows or on the columns, corner turning is carried out on the first processed data to turn it from said
rows into said columns or vice versa, and second high level data processing is carried out by second processing means on
the corner turned data in said columns or in said rows, the first processed data in said rows or columns being stored,
before or after corner turning, in separate memories (3a, 3b, 3c, 3d) associated one with each row (5a, 5b, 5c, 5d) or
column (6a, 6b, 6c, 6d).

This print incorporates corrections made under Section 117(1) of the Patents Act 1977.

Method and Apparatus for Parallel Processing Data

This invention relates to a Method and Apparatus for 
parallel processing data, particularly, but not exclusively, 
suitable for the processing of signal and/or image data.

Data is commonly stored serially row by row on a direct 
access bulk storage peripheral such as a disc file unit. Such 
data may be transferred to or from the disc file in blocks which 
are stored at random on the disc. Thus if it is required to 
access the columns of a matrix stored row by row, many blocks 
will require retrieval from the disc to access the column 
elements. This is time consuming and inefficient.

One way of reorganising the stored data is to transpose the 
data so that the stored blocks contain data in serial column
order instead of serial row order. This reorganisation is termed
"corner turning". Conventionally such corner turning has been
implemented by writing the row ordered data into a single large
memory and then reading it out in column order using a column
ordered address generator. However this known technique has
the disadvantage of causing a communications bottleneck.
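
The conventional scheme can be pictured with a short sketch. This is only an illustration under stated assumptions: the function name, the flat list standing in for the single large memory, and the row-major address layout are not taken from the specification.

```python
def corner_turn_single_memory(rows):
    """Transpose row ordered data through one shared memory (illustrative)."""
    n_rows, n_cols = len(rows), len(rows[0])
    memory = [None] * (n_rows * n_cols)  # the single large memory

    # Write phase: row ordered data goes to consecutive addresses.
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            memory[r * n_cols + c] = value

    # Read phase: a column ordered address generator strides through memory.
    def column_ordered_addresses():
        for c in range(n_cols):
            for r in range(n_rows):
                yield r * n_cols + c

    return [memory[address] for address in column_ordered_addresses()]


turned = corner_turn_single_memory([[1, 2, 3], [4, 5, 6]])
```

Every word written and every word read passes through the same memory port in this model, which is where the bottleneck described above arises.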

There is thus a need for a generally improved method and
apparatus for parallel processing of data which is more efficient
and which causes less of a communications bottleneck than the
aforementioned conventional techniques.

According to one aspect of the present invention there is 
provided a method of parallel processing data, in which the data 
is organised into a two dimensional array having at least two
rows and at least two transverse linking columns, first high
level data processing is carried out on the rows or on the
columns, corner turning is carried out on the first processed
data to turn it from said rows into said columns or vice versa,
and second high level data processing is carried out on the
corner turned data in said columns or in said rows, with the
first processed data in said rows or columns being stored,
before or after corner turning, in separate memories associated
one with each row or column.

Thus the corner turning memory is distributed between two
or more column processing elements. By operating all the
memories in parallel the communications bottleneck caused by a
single large corner turning memory is overcome.

Preferably said first high level data processing is carried
out on each of said rows of data, the corner turning is carried
out on the processed row data to turn it into column ordered
data and said second high level data processing is carried out on
the column ordered data.

Conveniently said first high level processing is carried out
by one row processor per row, said second high level processing
is carried out by one column processor per column and the
processed row data is stored in said separate memories
associated one with each row, before corner turning.

Advantageously said first high level processing is carried
out by one row processor per row, said second high level
processing is carried out by one column processor per column
and the processed row data is stored in separate memories
associated one with each column after corner turning.

Preferably corner turning is carried out by feeding the
processed data from each row in sequence, in parallel into a
shift register associated one with each column to form a series of
data sets and shifting the series of data sets from each shift
register into the associated memory in column order, from
whence the column ordered data can be read by the associated
column processor.

Conveniently said first high level processing is carried out 
on each of said columns of data, the corner turning is carried
out on the processed column data to turn it into row ordered
data and said second high level processing is carried out on the
row ordered data.

Advantageously said first high level processing is carried 
out by one column processor per column, said second high level 
processing is carried out by one row processor per row and the 
processed column data is stored after corner turning in said
separate memories associated one with each row. 

Preferably the corner turning is carried out by feeding the
processed data from each column in sequence, in parallel into a 
shift register associated one with each row to form a series of 
data sets and shifting the series of data sets from each shift 
register into the associated memory in row order, from whence 
the row ordered data can be read by the associated row 
processor. 

Conveniently one dimensional Fast Fourier Transforms are 
carried out on the data in each processor. 



According to a second aspect of the present invention there
is provided apparatus for the parallel processing of data,
including means for organising data into a two dimensional array
having at least two rows and at least two transverse linking
columns, first processing means for carrying out first high level
data processing on the rows or the columns, corner turning
means for carrying out corner turning on the first processed
data to turn it from said rows into said columns or vice versa,
second processing means for carrying out second high level data
processing on the corner turned data in said columns or in said
rows, and at least two separate memories associated one with
each row or column, which memories are located and operable to
store the first processed data in said rows or columns before or
after corner turning.

Preferably the first and second processing means are data 
processors located one in each row and column and wherein the 
corner turning means includes a plurality of shift registers
located one in each column. 

Conveniently the array has at least two substantially
parallel rows, with the first processing means data processors
being located respectively one at each input end of each row,
with the output end of each row being connected to the shift
register of one column and with the rows being connected
intermediate the row ends to the shift register of another
column, and wherein the second processing means data
processors are located respectively one at each output end of
each column to receive the output from the associated shift
register.

Advantageously the memories are located one in each row
between the associated row data processor and the row 
connections to the column shift register most remote from the 
output ends of the rows. 

Preferably the memories are located one in each column 
between the associated column data processor input and the 
associated column shift register output. 

Conveniently each data processor is operable to carry out 
one dimensional Fast Fourier Transforms. 

For a better understanding of the present invention, and to 
show how the same may be carried into effect, reference will now
be made, by way of example, to the accompanying drawings, in
which:

Figure 1 is a block diagram of apparatus according to a 
first embodiment of the invention for parallel processing data.

Figure 2 is a diagram illustrating an arrangement of shift 
registers to achieve corner turning of data using the method
according to the present invention and the apparatus of Figure 
1. 

Figure 3 is a diagram illustrating the relative timing of the 
control signals used by the shift register arrangement of Figure
2. 

Figure 4 is a view similar to that of Figure 1 showing a 
block diagram of an apparatus for parallel processing data 
according to a second embodiment of the invention.

As shown in the accompanying drawings, the apparatus and
method of the invention for parallel processing of data, such as
signal and/or image data, basically involves organising the data
into a two dimensional array having at least two rows and at
least two transverse linking columns. In the embodiments
illustrated in Figures 1 and 4 there are four such rows 5a, 5b,
5c and 5d and four such columns 6a, 6b, 6c, 6d. First high
level data processing is carried out on the rows 5a, 5b, 5c, 5d
or on the columns 6a, 6b, 6c, 6d, corner turning is carried out
on the first processed data to turn it from the rows into the
columns or vice versa and second high level data processing is
carried out on the corner turned data in the columns or in the
rows. The first processed data in the rows 5a, 5b, 5c, 5d or in
the columns 6a, 6b, 6c, 6d is stored, before or after corner
turning, in separate memories 3a, 3b, 3c, 3d associated one with
each row or column.

In the embodiment illustrated in Figure 1 the first high
level data processing is carried out on each of the rows 5a, 5b,
5c, 5d by one row processor 1a, 1b, 1c, 1d and the second high
level processing is carried out on the column ordered data by
one column processor 4a, 4b, 4c, 4d. The corner turning is
carried out on the processed row data by a plurality of shift
registers 2a, 2b, 2c, 2d located respectively one in each column
6a, 6b, 6c, 6d. The processed row data is stored in separate
memories 3a, 3b, 3c, 3d associated one with each column, after
corner turning. Although in the illustrated embodiments of
Figures 1 and 4 four rows and four columns have been shown, it
is of course to be understood that the method and apparatus of
the invention is operable with at least two rows and at least two
columns.

Corner turning is carried out by feeding the processed data
from each row 5a, 5b, 5c, 5d in sequence, in parallel into the
shift register associated one with each column to form
a series of data sets. The series of data sets for each shift
register 2a, 2b, 2c, 2d is shifted into the associated memory 3a,
3b, 3c, 3d in column order, from whence the column ordered
data can be read by the associated column processor 4a, 4b, 4c
or 4d.
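
The routing of data sets described above can be sketched in a few lines. The sketch assumes that successive data sets are dealt to the shift registers in rotation (set t to register t modulo the number of registers), which matches Example 1 later in the text; the function and variable names are illustrative only.

```python
def distribute(row_outputs, n_memories):
    """Deal successive data sets to the column memories in rotation.

    row_outputs[r][t] is the t-th result emitted by row processor r.
    """
    memories = [[] for _ in range(n_memories)]
    n_sets = len(row_outputs[0])
    for t in range(n_sets):
        # One value from every row processor forms a data set.
        data_set = [row[t] for row in row_outputs]
        # The set is shifted serially into the memory of its column.
        memories[t % n_memories].extend(data_set)
    return memories


example = distribute([["a0", "a1", "a2", "a3"],
                      ["b0", "b1", "b2", "b3"]], 2)
```

With two row processors and two memories, the first memory ends up holding the even numbered columns and the second the odd numbered ones, each stored in column order.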

The row and column processors have a high functionality, 
performing complete operations on segments of data (for example
256 samples) rather than elementary operations on single data 
samples. In the Figure 1 embodiment each column processor 4a, 
4b, 4c, 4d has an associated memory 3a, 3b, 3c, 3d into which 
the corner turned data is stored prior to column processing. 
Thus the total memory required to hold the data is distributed 
between all the column processors. 

This provides a high bandwidth communications structure
connecting a parallel array of row processors with a concurrently
operating array of column processors. Extremely high
performance may be obtained without communication bottlenecks,
with the addition of further rows and columns, and thus further
row processors and column processors, automatically increasing
the data input/output bandwidth. One dimensional data may be
processed by organising it in a two dimensional form prior to
processing. Data in three or more dimensions may be processed
by first organising the data into two dimensional arrays of data.
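
The specification does not say how one dimensional data is to be organised into two dimensional form. One plausible reading, shown here purely as an assumption, is to fold the sample stream into consecutive rows of a chosen length; `fold_1d` and `row_length` are hypothetical names.

```python
def fold_1d(samples, row_length):
    """Fold a one dimensional sequence into rows of row_length samples."""
    if len(samples) % row_length != 0:
        raise ValueError("sample count must be a multiple of row_length")
    return [samples[i:i + row_length]
            for i in range(0, len(samples), row_length)]


array_2d = fold_1d(list(range(6)), 3)
```

The resulting rows can then be fed to the row processors exactly as for data that was two dimensional to begin with.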

Although not illustrated, the first high level processing
could be carried out on each of the columns of data, the corner
turning carried out on the processed column data to turn it into
row ordered data and the second high level processing carried
out on the row ordered data. In other words the sequence of
Figure 1 in which data is inputted at 7 and outputted at 8 could
be reversed. In such an alternative, the first high level
processing would be carried out by the column processors, the
second high level processing carried out by the row processors
and the processed column data stored, after corner turning, in
the separate memories associated one with each row. The Figure
4 embodiment illustrates such alternative apparatus in which the
memories are associated with the row processors although in the
illustrated Figure 4 embodiment the data input 7 is to the rows
and the data output 8 is from the columns.

Example 1 

The example algorithm used in the method of the invention
is the two dimensional Fast Fourier Transform (FFT). This is a
well known algorithm which may be implemented by first applying
a one dimensional FFT to all the rows (5a, 5b, 5c, 5d) of the
two dimensional data array followed by applying a one
dimensional FFT to all the columns (6a, 6b, 6c, 6d) of the
resultant data array. In this example a 64 by 64 point array of
data as shown in Table 1 is to be transformed by processor
apparatus according to the first embodiment of the invention as
illustrated in Figure 1.
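
The row then column decomposition can be checked numerically on a small array. The sketch below is an illustration only: it uses a plain discrete Fourier transform in place of a fast algorithm, a 4 by 4 array in place of the 64 by 64 array of the example, and function names that do not come from the specification.

```python
import cmath


def dft(x):
    """Plain one dimensional discrete Fourier transform (not a fast one)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dft2_row_column(a):
    rows_done = [dft(row) for row in a]                      # first pass: rows
    cols_done = [dft(col) for col in transpose(rows_done)]   # second pass: columns
    return transpose(cols_done)                              # back to [row][col]


def dft2_direct(a):
    """Two dimensional transform evaluated straight from its definition."""
    n_r, n_c = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / n_r + v * c / n_c))
                 for r in range(n_r) for c in range(n_c))
             for v in range(n_c)]
            for u in range(n_r)]


sample = [[1, 2, 3, 4], [0, 1, 0, -1], [2, 0, 2, 0], [1, 1, 1, 1]]
by_rows_then_columns = dft2_row_column(sample)
by_definition = dft2_direct(sample)
```

The call to `transpose` between the two passes plays the part that corner turning plays in the apparatus: it presents the row results to the second pass in column order.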

In this particular case the row processors 1a, 1b, 1c, 1d
and the column processors 4a, 4b, 4c, 4d all perform identical
functions, namely a 64 point one dimensional FFT.

The data was processed four rows at a time. The first
four rows of data were passed through the four row processors
1a, 1b, 1c, 1d which perform 64 point FFTs on the data rows
5a, 5b, 5c, 5d respectively. The row processors output their
results in the same order as the data went in. The first set of
data to emerge was {0,0} from row processor 1a, {1,0} from row
processor 1b, {2,0} from row processor 1c and {3,0} from row
processor 1d. This set of data was loaded in parallel into the
first shift register 2a, then shifted out and placed in memory
3a. The next set of data from the row processors [{0,1} {1,1}
{2,1} {3,1}] was loaded into the next shift register 2b, then
shifted out into memory 3b. In a similar way memory 3c
received the data [{0,2} {1,2} {2,2} {3,2}] and memory 3d
received the data [{0,3} {1,3} {2,3} {3,3}]. The next set of data
to emerge from the row processors, [{0,4} {1,4} {2,4} {3,4}], was
loaded by the first shift register 2a into memory 3a. After rows
5a to 5d had been processed the next four rows were processed,
starting at {4,0}, then {5,0}, then {6,0} and {7,0}. This procedure
continued with the row processors processing each set of four
rows of data in turn until the last set of data [{60,63} {61,63}
{62,63} {63,63}] had been loaded into memory 3d.

Now memory 3a contains all the data from every fourth
column of the data array starting at column 6a (i.e. columns 0,
4, 8, ...) and memories 3b, 3c and 3d contain all the data from
every fourth column starting at columns 6b, 6c and 6d
respectively.

The column processors 4a, 4b, 4c and 4d can now read the
column orientated data out of the memories 3a, 3b, 3c and 3d
respectively and process each column in turn. The column
processors 4a, 4b, 4c and 4d first process columns 6a, 6b, 6c,
6d (0, 1, 2 and 3) respectively, followed by successive columns
(4, 5, 6 and 7) and so on until all the columns of data have
been processed. The column processors perform 64 point
FFTs on each column of data in the example two dimensional FFT
algorithm. The data from the processing apparatus appears in
column order at the output of the column processors. If desired
a further parallel processing apparatus may be added to the
output of the column processors 4a, 4b, 4c, 4d to convert the
column ordered data back to row ordered form.
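
The contents of the four memories in this example can be reproduced with a small model. The sketch stores a {row, column} label in place of each transformed value. The write order follows the description above; the address arithmetic in `read_column` is an assumption, since the specification states only that each column processor reads column orientated data from its memory.

```python
N, P = 64, 4  # array size and number of row (and column) processors

# memories[k] stands for memory 3a, 3b, 3c, 3d; entries are (row, column) labels.
memories = [[] for _ in range(P)]
for band in range(0, N, P):       # four rows are processed at a time
    for col in range(N):          # results emerge in input order
        data_set = [(band + i, col) for i in range(P)]
        memories[col % P].extend(data_set)


def read_column(k, col):
    """Gather one complete column from memory k (requires col % P == k)."""
    words_per_band = N            # N // P data sets of P words each
    slot = col // P               # position of this column within a band
    out = []
    for band_index in range(N // P):
        base = band_index * words_per_band + slot * P
        out.extend(memories[k][base:base + P])
    return out


column_5 = read_column(1, 5)
```

In this model each memory holds 1024 words, a quarter of the array, and a full 64 point column is assembled from sixteen groups of four consecutive words.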

By using a shift register structure to perform the corner
turning the memory elements required to hold the corner turned
data before column processing are distributed evenly between the
four column processors 4a, 4b, 4c, 4d. The four memories 3a,
3b, 3c, 3d are accessed concurrently, thereby improving data
throughput compared with a conventional single memory
arrangement.

Further to illustrate the method of the present invention a
specific implementation of the shift register structure will now be
described. With reference to Figure 2 an array of four-bit shift
registers was connected to the outputs of the row processors
1a, 1b, 1c, 1d and to the inputs of the column memories 3a, 3b,
3c, 3d. The data output from the row processor 1a is
represented by A0, A1, ..., where A0 is the least significant
bit of the data word, A1 the next bit and so on. Similarly B0,
B1, B2, ... is the output from row processor 1b; C0, C1, ...
the output from row processor 1c and D0, D1, ... the output
from row processor 1d.

The input to column memory 3a is represented by E0, E1,
E2, ..., where E0 is the least significant bit, E1 the next bit and
so on. F0, F1, ...; G0, G1, ... and H0, H1, ... are the
inputs to column memories 3b, 3c and 3d respectively.

Each four bit shift register is controlled by two signals, LD
and SH. LD causes the data at the parallel input (P0, P1, P2,
P3) of the shift register to be parallel loaded into the register.
SH causes the data within the shift register to be shifted down
one position. The serial (shifted) data appears at the output,
SOUT, of the shift register. Each vertical bank of shift
registers in Figure 2 has common LD and SH control signals.
For example the first bank (column 2a), which generates the
corner turned signals E0, E1, ... for column memory 3a, uses the
signals SHE and LDE. The relative timing of the shift register
control signals is shown in Figure 3.

The operation of the shift register "corner turning"
structure will now be described. As the first set of data
emerges from the row processors 1a, 1b, 1c, 1d during clock
period T0 (see Figure 3) the LDE (load shift register 2a) signal
is activated. This causes the data from all the row processors
to be loaded into the first column of shift registers (column 2a
in Figure 2). If a 16 bit word is used as the output from each
row processor then there will be sixteen shift registers in the
column, i.e. one shift register for each bit. Since there are
four row processors 1a, 1b, 1c, 1d in this example each shift
register 2a, 2b, 2c, 2d will be four bits long. Once the data
has been loaded into the first column of shift registers (column
2a) the data from row processor 1a (A0, A1, ...) is immediately
available at the outputs of those shift registers (E0, E1, ...).

During the next clock period T1 the next set of data
emerges from the row processors and is loaded into the second
column of shift registers (column 2b) by signal LDF. At the
same time the SHE line is activated, shifting the data in the first
column of shift registers (column 2a) down one place so that the
data previously loaded from row processor 1b is available at
their outputs.

On the next clock pulse (T2) the third set of data from the
row processors is loaded into the third column of shift registers
(column 2c) by signal LDG. Signals SHE and SHF cause the
data in the first and second columns of shift registers (columns
2a and 2b respectively) to be shifted down one place.

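Now data C0, C1, ... (loaded in time slot T0) is available at
the output of the first column of shift registers (column 2a),
data B0, B1, ... (loaded in time slot T1) is available at the
output of the second column of shift registers (column 2b) and
data A0, A1, ... (loaded in time slot T2) is available at the
output of the third column of shift registers (column 2c).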

This procedure continues with data from the row processors
being loaded into one column of shift registers while the data in
the other three columns of shift registers is shifted down one
place. The output from the shift registers constitutes the
required corner turned data for each column processor 4a, 4b,
4c, 4d which is loaded into its associated memory 3a, 3b, 3c,
3d.
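
The staggered load and shift sequence can be simulated as follows. This is a word level sketch under stated assumptions: each stage holds a whole data word rather than one bit of it, one bank is loaded per clock period in rotation while the others shift, and a few extra periods are run at the end to drain the banks. The class and function names are illustrative and do not appear in the specification.

```python
class ShiftRegister:
    """Word level model of one bank of parallel load shift registers."""

    def __init__(self, length):
        self.stages = [None] * length

    def load(self, values):
        # LD: parallel load one value per stage.
        self.stages = list(values)

    def shift(self):
        # SH: move every stage down one position.
        self.stages = self.stages[1:] + [None]

    @property
    def sout(self):
        # Serial output: the value currently in the first stage.
        return self.stages[0]


def corner_turn(row_outputs):
    p = len(row_outputs)              # number of row processors and banks
    n_sets = len(row_outputs[0])
    banks = [ShiftRegister(p) for _ in range(p)]
    memories = [[] for _ in range(p)]
    trace = []                        # bank outputs in each clock period
    for t in range(n_sets + p - 1):   # extra periods drain the last banks
        for k, bank in enumerate(banks):
            if t < n_sets and t % p == k:
                bank.load([row[t] for row in row_outputs])
            else:
                bank.shift()
            if bank.sout is not None:
                memories[k].append(bank.sout)
        trace.append([bank.sout for bank in banks])
    return memories, trace


# Label each word by its row processor (A to D) and the period it emerged in.
rows = [[(name, t) for t in range(8)] for name in "ABCD"]
memories, trace = corner_turn(rows)
```

In this model `trace[2]` shows word C of the set loaded in T0 at the first bank, word B of the set loaded in T1 at the second and word A of the set loaded in T2 at the third, in line with the sequence described above.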

Although the foregoing Example 1 has been described in
terms of the apparatus for parallel processing of data according
to the embodiment of Figure 1, it is to be understood that a
similar method can be carried out with the apparatus for the
parallel processing of data as illustrated in the second
embodiment of Figure 4. The primary difference between the
two embodiments is that in the embodiment of Figure 4 the
memories 3a, 3b, 3c and 3d are associated with the row
processors 1a, 1b, 1c and 1d. Additionally although in the two
illustrated embodiments the data input has been shown as to the
row processors 1a, 1b, 1c and 1d, with the output from the
column processors 4a, 4b, 4c and 4d, it is, however, to be
understood that the data input could be to the column processors
and the data output from the row processors.

Additionally, although four rows 5a, 5b, 5c and 5d and
four columns 6a, 6b, 6c and 6d have been described and
illustrated with respect to the embodiments of Figures 1 and 4, a
minimum of two such rows and two such columns may be
provided, or more than four such rows and columns if desired.
In any event each row will include one row processor and each
column will include one shift register and column processor.
One memory will be provided for each column or row. The
output ends of the rows 5a, 5b, 5c and 5d are connected, in the
illustrated embodiments, to the shift register 2d of the column
6d. The rows are also connected intermediate the row ends at
specific spacings there along to the shift register 2a of the
column 6a, to the shift register 2b of the column 6b and to the
shift register 2c of the column 6c. In the Figure 1 embodiment
the memories 3a, 3b, 3c and 3d are connected respectively one
between each of the shift registers and the column processors.
In the Figure 4 embodiment the memories are connected one
between each of the row processors and the first column
connection 6a. Each row processor and each column processor is
operable to carry out one dimensional Fast Fourier Transforms.

The column processors or row processors each or all may be 
capable of performing one specific function only, which 
preferably may be selected from several possible predefined 
modes of operation.

CLAIMS

1. A method of parallel processing data, in which the data is
organised into a two dimensional array having at least two rows
and at least two transverse linking columns, first high level data
processing is carried out on the rows or on the columns, corner
turning is carried out on the first processed data to turn it from
said rows into said columns or vice versa, and second high level
data processing is carried out on the corner turned data in said
columns or in said rows, with the first processed data in said
rows or columns being stored, before or after corner turning, in
separate memories associated one with each row or column.
2. A method according to Claim 1, in which said first high 
level data processing is carried out on each of said rows of 
data, the corner turning is carried out on the processed row
data to turn it into column ordered data and said second high
level data processing is carried out on the column ordered data. 
3. A method according to Claim 2, in which said first high
level processing is carried out by one row processor per row,
said second high level processing is carried out by one column
processor per column and the processed row data is stored in
said separate memories associated one with each row, before
corner turning.

4. A method according to Claim 2, in which said first high
level processing is carried out by one row processor per row,
said second high level processing is carried out by one column
processor per column and the processed row data is stored in
separate memories associated one with each column, after corner
turning.

5. A method according to Claim 4, in which corner turning is
carried out by feeding the processed data from each row in
sequence, in parallel into a shift register associated one with
each column to form a series of data sets and shifting the series
of data sets from each shift register into the associated memory
in column order, from whence the column ordered data can be
read by the associated column processor.

6. A method according to Claim 1, in which said first high 
level processing is carried out on each of said columns of data, 
the corner turning is carried out on the processed column data 
to turn it into row ordered data and said second high level 
processing is carried out on the row ordered data. 

7. A method according to Claim 6, in which said first high
level processing is carried out by one column processor per
column, said second high level processing is carried out by one
row processor per row and the processed column data is stored
after corner turning in said separate memories associated one
with each row.

8. A method according to Claim 7, in which the corner turning
is carried out by feeding the processed data from each column in
sequence, in parallel into a shift register associated one with
each row to form a series of data sets and shifting the series of
data sets from each shift register into the associated memory in
row order, from whence the row ordered data can be read by the
associated row processor.

9. A method according to Claim 3, Claim 4 or Claim 7, in
which one dimensional Fast Fourier Transforms are carried out 
on the data in each processor. 



10. A method according to any one of Claims 1 to 9, in which 
one dimensional data is processed by first organising it into a 
two dimensional array.

11. A method according to any one of Claims 1 to 10, in which
the data to be parallel processed is signal and/or image data.

12. A method according to any one of Claims 1 to 9, in which 
data in three or more dimensions is processed by first organising
it into two dimensional arrays of data. 

13. A method of parallel processing data substantially as
hereinbefore described with reference to Figures 1 to 3 or 
Figure 4 of the accompanying drawings. 

14. Apparatus for the parallel processing of data, including
means for organising data into a two dimensional array having at
least two rows and at least two transverse linking columns, first
processing means for carrying out first high level data
processing on the rows or the columns, corner turning means for
carrying out corner turning on the first processed data to turn
it from said rows into said columns or vice versa, second
processing means for carrying out second high level processing
on the corner turned data in said columns or in said rows, and
at least two separate memories associated one with each row or
column, which memories are located and operable to store the
first processed data in said rows or columns before or after
corner turning.

15. Apparatus according to Claim 14, wherein the first and
second processing means are data processors located one in each
row and column and wherein the corner turning means includes a
plurality of shift registers located one in each column.

16. Apparatus according to Claim 15, wherein the array has at 
least two substantially parallel rows, with the first processing
means data processors being located respectively one at each
input end of each row, with the output end of each row being
connected to the shift register of one column and with the rows
being connected intermediate the row ends to the shift register
of another column, and wherein the second processing means 
data processors are located respectively one at each output end 
of each column to receive the output from the associated shift 
register. 

17. Apparatus according to Claim 16, wherein the memories are 
located one in each row between the associated row data 
processor and the row connections to the column shift register 
most remote from the output ends of the rows. 

18. Apparatus according to Claim 16, wherein the memories are
located one in each column between the associated column data 
processor input and the associated column shift register output. 
19. Apparatus according to any one of Claims 15 to 18, wherein 
each data processor is operable to carry out one dimensional Fast 
Fourier Transforms.

20. Apparatus for the parallel processing of data, substantially
as hereinbefore described and as illustrated in Figure 1 or
Figure 4 of the accompanying drawings.